Geometric Feature Extraction from Urdu Ligatures
Abstract
This research aims to extract geometric features from Urdu ligatures. Although structural features are robust, their extraction and analysis are exceptionally complex and time-consuming. By contrast, geometric features are straightforward to extract and analyze, and they are independent of language, script and font. Twelve geometric features are extracted from the ligature images: height, width, aspect ratio, density function, perimeter, area, perimeter-to-area ratio, horizontal projection profile, vertical projection profile, start point, end point, and the slope between the start and end points.
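The abstract names the twelve features but does not define them, so the following is only a minimal sketch of how they might be computed from a binary ligature image. Every definition in it is an assumption made for illustration, not the authors' method: the image is a rectangular grid of 0/1 values, area is the ink pixel count, density is ink pixels divided by bounding-box area, perimeter is the number of ink pixels with a 4-connected background neighbour, and the start and end points are taken as the rightmost and leftmost ink pixels because Urdu is written right to left. The function name `geometric_features` and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # rectangular grid; nonzero = ink, 0 = background
Point = Tuple[int, int]  # (row, col) in image coordinates


def geometric_features(img: Image) -> Dict[str, object]:
    """Compute twelve geometric features under the assumed definitions."""
    # Coordinates of every foreground (ink) pixel.
    ink: List[Point] = [
        (r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v
    ]
    if not ink:
        raise ValueError("image contains no foreground pixels")

    # Tight bounding box around the ink.
    top = min(r for r, _ in ink)
    bottom = max(r for r, _ in ink)
    left = min(c for _, c in ink)
    right = max(c for _, c in ink)

    height = bottom - top + 1
    width = right - left + 1
    area = len(ink)  # assumed: area = number of ink pixels

    # Assumed perimeter: ink pixels with at least one 4-neighbour that is
    # background or lies outside the image.
    ink_set = set(ink)
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    perimeter = sum(
        1
        for r, c in ink
        if any((r + dr, c + dc) not in ink_set for dr, dc in neighbours)
    )

    # Projection profiles: ink count per row and per column of the box.
    h_profile = [
        sum(1 for c in range(left, right + 1) if img[r][c])
        for r in range(top, bottom + 1)
    ]
    v_profile = [
        sum(1 for r in range(top, bottom + 1) if img[r][c])
        for c in range(left, right + 1)
    ]

    # Assumed start/end: rightmost and leftmost ink pixel (topmost on ties).
    start = min(ink, key=lambda p: (-p[1], p[0]))
    end = min(ink, key=lambda p: (p[1], p[0]))
    dx = end[1] - start[1]
    dy = end[0] - start[0]
    # Slope in image coordinates (rows grow downward); None if vertical.
    slope: Optional[float] = dy / dx if dx != 0 else None

    return {
        "height": height,
        "width": width,
        "aspect_ratio": width / height,
        "density": area / (height * width),
        "perimeter": perimeter,
        "area": area,
        "perimeter_area_ratio": perimeter / area,
        "horizontal_profile": h_profile,
        "vertical_profile": v_profile,
        "start_point": start,
        "end_point": end,
        "slope": slope,
    }


if __name__ == "__main__":
    sample = [
        [0, 0, 0, 1],
        [0, 1, 1, 1],
        [1, 1, 0, 0],
    ]
    for name, value in geometric_features(sample).items():
        print(name, value)
```

None of these quantities depends on knowing which characters the ligature contains, which is consistent with the abstract's claim that geometric features are script and font independent. Other conventions (for example contour length as the perimeter, or skeleton endpoints as the start and end points) would be equally plausible readings.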
Similar resources
Robust Optical Recognition of Cursive Pashto Script Using Scale, Rotation and Location Invariant Approach
The presence of a large number of unique shapes called ligatures in cursive languages, along with variations due to scaling, orientation and location provides one of the most challenging pattern recognition problems. Recognition of the large number of ligatures is often a complicated task in oriental languages such as Pashto, Urdu, Persian and Arabic. Research on cursive script recognition ofte...
Line and Ligature Segmentation in Printed Urdu Document Images
This paper presents a technique for segmentation of printed Urdu text images into lines and ligatures, a key pre-processing step in Urdu Optical Character Recognition (OCR) systems. Unlike classical projection profile based line segmentation methods, the proposed scheme successfully segments overlapping and touching lines. Once the lines are segmented, ligatures are extracted from each text lin...
Segmentation-free optical character recognition for printed Urdu text
This paper presents a segmentation-free optical character recognition system for printed Urdu Nastaliq font using ligatures as units of recognition. The proposed technique relies on statistical features and employs Hidden Markov Models for classification. A total of 1525 unique high-frequency Urdu ligatures from the standard Urdu Printed Text Images (UPTI) database are considered in our study. ...
Optical Character Recognition System for Urdu Words in Nastaliq Font
Optical Character Recognition (OCR) has been an attractive research area for the last three decades and mature OCR systems reporting near to 100% recognition rates are available for many scripts/languages today. Despite these developments, research on recognition of text in many languages is still in its early days, Urdu being one of them. The limited existing literature on Urdu OCR is either l...
Word Segmentation for Urdu OCR System
This paper presents a technique for Word segmentation for the Urdu OCR system. Word segmentation or word tokenization is a preliminary task for understanding the meanings of sentences in Urdu language processing. Several techniques are available for word segmentation in other languages but not much work has been done for word segmentation of Urdu Optical Character Recognition (OCR) System. A me...